ggplot2 (continued)

Note: There are often multiple ways to answer each question.
Load the ggplot2 and fueleconomy packages, as well as the vehicles dataset. Run the code below to extract just the first 2,000 rows of the dataset.
library(ggplot2)       # plotting
library(fueleconomy)   # provides the vehicles dataset
data(vehicles)
vehicles <- vehicles[1:2000, ]  # keep only the first 2,000 rows
Make a scatterplot of hwy vs. cty. Give the plot axis titles and a main title to make it more interpretable.

ggplot(vehicles, aes(x = cty, y = hwy)) +
  geom_point() +
  labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
       y = "Highway mpg")
Color the points according to their cyl value. Also reduce the alpha of the points to an appropriate level and introduce jitter.

ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
  geom_jitter(alpha = 0.2) +
  labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
       y = "Highway mpg")
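geom_jitter() adds a small random offset to each point, so the picture changes slightly every time it is drawn. A minimal sketch, assuming you want a reproducible plot with one color per cylinder count: position_jitter() accepts a seed argument, and wrapping cyl in factor() replaces the continuous color gradient (cyl appears to be stored as an integer) with a discrete palette. The width and height of 0.3 and the object name p_jitter are illustrative choices, not part of the exercise.

```r
library(ggplot2)
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]

# Fixed jitter: the same random offsets are reused each time the plot is built.
# width/height = 0.3 (in mpg units) are illustrative values.
jitter_pos <- position_jitter(width = 0.3, height = 0.3, seed = 1)

# factor(cyl) gives a discrete color scale instead of a continuous gradient
p_jitter <- ggplot(vehicles, aes(x = cty, y = hwy, col = factor(cyl))) +
  geom_point(alpha = 0.2, position = jitter_pos) +
  labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
       y = "Highway mpg", col = "Cylinders")
p_jitter
```

Building the plot twice gives identical jittered positions, which is handy when the figure must match its description in the text.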
Facet the plot so that each value of cyl is in its own plot.

ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
  geom_jitter(alpha = 0.2) +
  labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
       y = "Highway mpg") +
  facet_wrap(~ cyl)
With shared axes it is easy to see how the different cyl values compare with each other, but a lot of the plot space is wasted. Modify the plot so that each small plot has its own x and y scale. (Hint: This website might be helpful.)

ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
  geom_jitter(alpha = 0.2) +
  labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
       y = "Highway mpg") +
  facet_wrap(~ cyl, scales = "free")
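The scales argument of facet_wrap() takes "fixed" (the default), "free", "free_x", or "free_y". Freeing only one axis can be a middle ground: panels stay directly comparable along one axis while using the space better along the other. A minimal sketch that builds all four variants so they can be compared side by side; the names base_plot and facet_plots are illustrative.

```r
library(ggplot2)
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]

base_plot <- ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
  geom_jitter(alpha = 0.2) +
  labs(x = "City mpg", y = "Highway mpg")

# One faceted plot per `scales` option accepted by facet_wrap()
scale_options <- c("fixed", "free", "free_x", "free_y")
facet_plots <- lapply(scale_options, function(s) {
  base_plot +
    facet_wrap(~ cyl, scales = s) +
    ggtitle(paste0('scales = "', s, '"'))
})
names(facet_plots) <- scale_options

facet_plots[["free_y"]]  # print one variant
```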
Make a bar plot showing how many vehicles of each fuel type there are in the dataset. (Use the geom_bar geom.) Change the theme to ggplot2’s black and white theme.

ggplot(vehicles, aes(x = fuel)) +
  geom_bar() +
  theme_bw()
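geom_bar() uses stat_count() by default, so each bar's height is simply the number of rows with that fuel value. A minimal sketch that checks this: layer_data() returns the data ggplot2 computed for a layer, including a count column, and it should agree with a plain table() of the same column. The object names are illustrative.

```r
library(ggplot2)
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]

p_fuel <- ggplot(vehicles, aes(x = fuel)) +
  geom_bar() +
  theme_bw()

# Counts computed by stat_count() for the first (and only) layer
bar_counts <- layer_data(p_fuel)$count

# The same counts computed directly, one entry per fuel type
fuel_counts <- table(vehicles$fuel, useNA = "ifany")
fuel_counts
```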
Make a violin plot of displ for each value of drive. Overlay it with a scatterplot of displ vs. drive (with jitter and alpha). How does the scatterplot give the reader more information?

ggplot(vehicles, aes(x = drive, y = displ)) +
  geom_violin() +
  geom_jitter(alpha = 0.2)
## Warning: Removed 2 rows containing non-finite values (stat_ydensity).
## Warning: Removed 2 rows containing missing values (geom_point).
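The scatterplot shows how many observations there actually are for each value of drive. The violin plot doesn’t convey that information well, because by default geom_violin() scales every violin to the same area regardless of how many points it summarizes. (For example, there are very few observations with 2-Wheel Drive.)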
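The group sizes that the jittered points reveal can also be read off directly. A minimal sketch, assuming only base R on top of the loaded data: table() counts vehicles per drive type, and counting non-finite displ values shows how many rows ggplot2 drops from both layers, which is what the "Removed ... rows" warnings refer to. Because each violin has the same area by default, these counts are not visible from the violins alone.

```r
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]

# Number of vehicles for each drive type, smallest groups first
drive_counts <- table(vehicles$drive, useNA = "ifany")
sort(drive_counts)

# Rows whose displacement is missing or otherwise non-finite;
# ggplot2 removes these before drawing the violins and the points
n_missing_displ <- sum(!is.finite(vehicles$displ))
n_missing_displ
```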
Make a jittered scatterplot of hwy against year with alpha value 0.5. Add a geom_smooth layer with the option method = "lm" and without the SE bands.

ggplot(vehicles, aes(x = year, y = hwy)) +
  geom_jitter(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE)
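With method = "lm", geom_smooth() draws an ordinary least-squares line of y on x. Jitter is a position adjustment of the point layer only, so the smoothing layer is fitted to the original, unjittered year and hwy values. A minimal sketch that fits the same model with lm() and compares its predictions with the line ggplot2 computed (read via layer_data() for the second layer); the names fit and smooth_pts are illustrative.

```r
library(ggplot2)
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]

p_year <- ggplot(vehicles, aes(x = year, y = hwy)) +
  geom_jitter(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE)

# The same straight-line fit computed directly on the unjittered data
fit <- lm(hwy ~ year, data = vehicles)
coef(fit)

# Points along the line that ggplot2 computed for the smoothing layer
smooth_pts <- layer_data(p_year, 2)
pred <- predict(fit, newdata = data.frame(year = smooth_pts$x))
```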
Color the points by fuel. Also, change the theme to ggplot2’s minimal theme and move the legend to the bottom of the plot. What happens to the geom_smooth layer?

ggplot(vehicles, aes(x = year, y = hwy, col = fuel)) +
  geom_jitter(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  theme(legend.position = "bottom")
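The geom_smooth layer now fits a separate regression line for each value of fuel. Because col = fuel is mapped in the global aes() call, the geom_smooth layer inherits it, and the data are split into one group per fuel type.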
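The per-group lines correspond to fitting lm(hwy ~ year) separately within each fuel type. A minimal sketch of that equivalence in base R, using split() and lapply(); the names by_fuel and fuel_fits are illustrative. A fuel type whose vehicles all share one model year gets an NA slope, since no line can be estimated from a single x value.

```r
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]

# One data frame per fuel type (rows with a missing fuel are dropped)
by_fuel <- split(vehicles, vehicles$fuel)

# Intercept and slope of hwy ~ year within each fuel type
fuel_fits <- lapply(by_fuel, function(d) coef(lm(hwy ~ year, data = d)))

# One row per fuel type
do.call(rbind, fuel_fits)
```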
Make a scatterplot of hwy vs. cty, with the color of each point depending on year. Change the color scale to the “Spectral” palette. Do you see a trend?

ggplot(vehicles, aes(x = cty, y = hwy, col = year)) +
  geom_jitter() +
  scale_color_distiller(palette = "Spectral")
As time goes on, we tend to see higher values of both highway and city mpg. This makes sense, since we expect the cars to be more fuel-efficient as time goes on.
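A color gradient only suggests a trend, so it can help to check it numerically. A minimal sketch, assuming base R only: aggregate() computes the mean highway and city mpg for each model year, and cor() gives the correlation of year with each measure. Whether the numbers bear out the visual impression depends on the subset of rows in use; the name yearly is illustrative.

```r
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]

# Mean highway and city mpg per model year; rows with a missing hwy
# or cty value are dropped by the formula interface
yearly <- aggregate(cbind(hwy, cty) ~ year, data = vehicles, FUN = mean)
yearly

# Pairwise correlations using complete rows only
cor(vehicles[, c("year", "hwy", "cty")], use = "complete.obs")
```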
ggplot(vehicles, aes(x = cty, y = hwy, col = year)) +
  geom_jitter() +
  scale_color_distiller(palette = "Reds") +
  labs(title = "Plot of highway mpg vs. city mpg") +
  theme_bw() +
  theme(plot.title = element_text(size = rel(1.5), face = "bold", hjust = 0.5))